Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC)
Authors
Abstract
The Super Database Computer (SDC) is a high-performance relational database server for a join-intensive environment, under development at the University of Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other join algorithms, a hash-based algorithm is quite efficient and easily parallelized, and has been employed by many database machines. However, in the presence of data skew, it is hard to distribute load equally among processing modules (PMs) by statically allocating buckets to PMs, as in the conventional parallelizing strategy. Thus, performance is severely degraded. In this paper, we propose a new parallel hash join method, the bucket spreading strategy, which is robust against data skew. While the relations are being partitioned, each bucket is further divided into fragments of the same size, and these fragments are temporarily placed on the PMs one by one. Then each bucket is dynamically allocated to a PM, which actually carries out the join of the bucket, and all fragments of the bucket are collected in the corresponding PM. In this way, the bucket spreading strategy evenly distributes the load among the PMs, and parallelism is always fully exploited. The architecture of SDC is designed to support the bucket spreading strategy; a mechanism which distributes the buckets flatly among the PMs is embedded in the hardware of the interconnection network. Simulation results confirm that the bucket spreading strategy is robust against data skew and attains very good scalability.
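The two phases described in the abstract can be illustrated with a minimal software sketch. It is not the SDC implementation: in SDC the flat distribution of buckets is done by the interconnection network hardware, whereas here a simple round-robin counter stands in for it. The function names, the staggered starting PM per bucket, and the largest-bucket-first rule used for the dynamic allocation are all assumptions made for illustration; the abstract only states that buckets are allocated dynamically.

```python
from collections import defaultdict

def spread(tuples, num_buckets, num_pms, fragment_size):
    """Partition phase: hash tuples into buckets, cut each bucket into
    fixed-size fragments, and deal the fragments to the PMs one by one."""
    store = [defaultdict(list) for _ in range(num_pms)]  # store[pm][bucket] -> tuples
    pending = defaultdict(list)                          # fragment being filled, per bucket
    # Assumption: each bucket starts its round-robin at a different PM.
    next_pm = {b: b % num_pms for b in range(num_buckets)}
    sizes = defaultdict(int)

    def flush(b):
        store[next_pm[b]][b].extend(pending[b])
        next_pm[b] = (next_pm[b] + 1) % num_pms
        pending[b] = []

    for key, payload in tuples:
        b = hash(key) % num_buckets
        pending[b].append((key, payload))
        sizes[b] += 1
        if len(pending[b]) == fragment_size:
            flush(b)
    for b in list(pending):          # flush partially filled fragments
        if pending[b]:
            flush(b)
    return store, dict(sizes)

def allocate_static(sizes, num_pms):
    """Conventional strategy: bucket b always belongs to PM b mod N."""
    return {b: b % num_pms for b in sizes}

def allocate_dynamic(sizes, num_pms):
    """Assumed dynamic rule: largest bucket first, to the least loaded PM."""
    loads = [0] * num_pms
    owner = {}
    for b in sorted(sizes, key=lambda b: (-sizes[b], b)):
        pm = loads.index(min(loads))
        owner[b] = pm
        loads[pm] += sizes[b]
    return owner

def join_loads(sizes, owner, num_pms):
    """Number of tuples each PM has to join under a given allocation."""
    loads = [0] * num_pms
    for b, pm in owner.items():
        loads[pm] += sizes[b]
    return loads

def collect(store, owner, num_pms):
    """Join phase preparation: gather all fragments of a bucket on its owner PM."""
    gathered = [defaultdict(list) for _ in range(num_pms)]
    for pm_store in store:
        for b, fragment_tuples in pm_store.items():
            gathered[owner[b]][b].extend(fragment_tuples)
    return gathered

# Skewed input: keys 0, 4, 8, 12 are five times as frequent as the others.
counts = {k: (100 if k % 4 == 0 else 20) for k in range(16)}
tuples = [(k, i) for k, n in counts.items() for i in range(n)]

store, sizes = spread(tuples, num_buckets=16, num_pms=4, fragment_size=10)
stored_per_pm = [sum(len(v) for v in s.values()) for s in store]
static_loads = join_loads(sizes, allocate_static(sizes, 4), 4)
dynamic_owner = allocate_dynamic(sizes, 4)
dynamic_loads = join_loads(sizes, dynamic_owner, 4)
gathered = collect(store, dynamic_owner, 4)
```

With this input, static allocation puts all four heavy buckets on the same PM, so `static_loads` is `[400, 80, 80, 80]`, while spreading followed by dynamic allocation gives `dynamic_loads` of `[160, 160, 160, 160]`. Because the fragments are dealt out one by one, every PM also holds the same number of tuples after partitioning.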
Proceedings of the 16th VLDB Conference, Brisbane, Australia, 1990
Yasushi Ogawa, Research and Development Center, RICOH Co., Ltd., 16-1 Shinei-cho, Kohoku-ku,
Similar resources
Implementation and Performance Evaluation of the Parallel Relational Database Server SDC-II
This paper presents the implementation and performance evaluation of the SDC-II, the Super Database Computer II. The SDC-II is a highly parallel relational database server, which consists of eight data processing modules interconnected by two networks, where each module contains up to seven processors connected by two busses and four disk drives. The SDC-II employs several key techniques to effi...
Implementation and Evaluation of the Bucket Flattening Omega Network of the Parallel Relational Database Server
This paper presents the implementation and performance evaluation of the Bucket Flattening Omega Network of the SDC-II, the Super Database Computer II. The SDC-II is a highly parallel relational database server, which consists of eight data processing modules interconnected by two networks. Parallelism in the parallel relational database processing on the shared nothing architecture would suffe...
Implementation and Evaluation of the Bucket Flattening Omega Network of the Parallel Relational Database Server SDC-II
This paper presents the implementation and performance evaluation of the Bucket Flattening Omega Network of the SDC-II, the Super Database Computer II. The SDC-II is a highly parallel relational database server, which consists of eight data processing modules interconnected by two networks. Parallelism in the parallel relational database processing on the shared nothing architecture would suffer...
Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning
Shared nothing multiprocessor architecture is known to be more scalable to support very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to the data skew problem. Unless the parallel hash join algorithm includes some load balancing mec...
An Improved Hash-based Join Algorithm in the Presence of Double Skew on a Hypercube Computer
This paper presents an improved parallel hash-based join algorithm on a hypercube computer in the presence of double skew. We describe a load balancing technique to evenly distribute both join relations across all processors in order to deal with double skew effectively. Moreover, we propose a permutation join method which reduces main memory requirement for the local join operation in the previ...